Abstract



This paper discusses loglinear models for assessing differential item functioning (DIF). Loglinear 
and logit models that have been suggested for studying DIF are reviewed, and loglinear formulations 
of the logit models are given. A polynomial loglinear model for assessing DIF is introduced. 
Two examples using the polynomial loglinear model for investigating DIF are discussed. One 
example investigates DIF for a test consisting of both dichotomous and polytomous items. Another 
example illustrates the use of DIF techniques in investigating whether common items are functioning 
differently on two forms of a test in the common item nonequivalent groups equating design. 



There are many procedures a researcher may use to examine the validity of a test, so as 
to prevent bias from inadvertently affecting a sub-group of people that the test is intended for. 
Procedures of this type are part of the process of construct validation. One aspect of investigating the
validity of a test for various groups of people is the investigation of whether bias exists at the item 
level. Item bias is said to exist when an item is functioning differently for two or more groups 
of people, within the population the test is intended for. Item bias manifests itself by differential 
response to an item based on the group a person belongs to, when conditioned on the latent variable 
being measured by the test the item is a part of. The phrase differential item functioning (DIF) has 
been used to refer to this type of differential item performance. Item bias is defined conditioned on 
the latent variable measured by the test as there can be differences in responding to an item among 
groups (termed impact) that reflect legitimate differences between the groups on the latent variable 
measured by the test. When conditioned on the latent variable measured by the test there should be 
no differences between the groups in responding to the item. 

This paper discusses loglinear models used for assessing DIF. The loglinear models allow 
investigation of DIF for dichotomously scored items (items scored correct or incorrect), or
polytomously scored items (items with more than two response categories). A definition of DIF is first
presented and is followed by a review of contingency table approaches that have been used to
investigate DIF. A polynomial loglinear model for assessing DIF which incorporates the numerical scores
given to item response categories and matching variable categories is presented. Two examples
using the polynomial loglinear model for investigating DIF are given. One example investigates 
DIF for a test consisting of both dichotomous and polytomous items. Another example uses DIF 
techniques in investigating whether common items are functioning differently on two forms of a 
test in the common item nonequivalent groups equating design. 

Definition of DIF 

The data used to investigate DIF for a particular item consists of three variables: 1) an item 
response variable (Y), 2) a group variable (V), and 3) a matching variable (Z). It is assumed in 
this paper that the matching variable and item response are categorical rather than continuous (the 
group variable is also categorical). The data used to investigate DIF for a particular item are then
contained in an I x J x K table, where there are I categories for the item response, J
groups, and K categories for the matching variable.

There is no DIF for the item in question if Y and V are conditionally independent given Z. 
Conditional independence of Y and V given Z can be expressed as 

$$\Pr(Y = y, V = v \mid Z = z) = \Pr(Y = y \mid Z = z)\,\Pr(V = v \mid Z = z) , \tag{1}$$

for all y, z, and v. Another way to express the conditional independence of Y and V given Z is 

$$\Pr(Y = y \mid Z = z, V = v) = \Pr(Y = y \mid Z = z) \tag{2}$$

for all y, z, and v. The equivalence of Equations 1 and 2 is called the Fundamental Lemma of 
Measurement Invariance by Meredith and Millsap (1992). 

True DIF is defined with the matching variable being the latent variable measured by the test 
the item is part of. In practice it is not possible to use this true matching variable. In this paper Z 
is considered to be an observable variable, in which case the condition represented by Equation 2 
is referred to as observed conditional invariance (Millsap and Everson, 1993). 

Let $m_{ijk}$ be the expected count for item response category $i$, group $j$, and matching variable
category $k$. Conditional independence of Y and V given Z is equivalent to the conditional odds
ratios

$$\theta_{ij(k)} = \frac{m_{ijk}\, m_{i+1,j+1,k}}{m_{i+1,j,k}\, m_{i,j+1,k}} , \qquad 1 \le i < I,\ 1 \le j < J \tag{3}$$

all being equal to 1 for every $k$. If any of the conditional odds ratios $\theta_{ij(k)}$ differs from 1 then DIF
is said to exist. Uniform DIF is said to exist when some $\theta_{ij(k)}$ differ from 1 and for each $i$ and $j$,
$\theta_{ij(k)} = \theta_{ij(k')}$ for all $k \ne k'$. DIF that is not uniform is called nonuniform DIF.
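
The definitions above can be checked numerically for any table of expected counts. The following is a minimal sketch, not part of the original paper: the function names, the tolerance, and the three small 2 x 2 x 3 tables are hypothetical, and indices are 0-based.

```python
import math

def local_log_odds_ratios(m):
    """Log of the Equation 3 odds ratios for a table m[i][j][k] of positive counts."""
    I, J, K = len(m), len(m[0]), len(m[0][0])
    return [[[math.log(m[i][j][k] * m[i + 1][j + 1][k])
              - math.log(m[i + 1][j][k] * m[i][j + 1][k])
              for k in range(K)]
             for j in range(J - 1)]
            for i in range(I - 1)]

def classify_dif(m, tol=1e-9):
    """Return 'none', 'uniform', or 'nonuniform' following the definitions above."""
    rows = [row for block in local_log_odds_ratios(m) for row in block]
    if all(abs(v) <= tol for row in rows for v in row):
        return "none"          # every odds ratio equals 1
    if all(abs(v - row[0]) <= tol for row in rows for v in row):
        return "uniform"       # odds ratios differ from 1 but not across k
    return "nonuniform"

# Hypothetical expected counts: m[i][j][k] = a[i][k] * b[j][k] has no DIF.
a = [[10, 20, 30], [30, 20, 10]]
b = [[1, 1, 1], [2, 3, 4]]
no_dif = [[[a[i][k] * b[j][k] for k in range(3)] for j in range(2)] for i in range(2)]

# Doubling one cell in every stratum gives a constant odds ratio of 2.
uniform = [[list(row) for row in block] for block in no_dif]
uniform[0][0] = [2 * v for v in uniform[0][0]]

# Scaling that cell by k + 1 makes the odds ratio change with k.
nonuniform = [[list(row) for row in block] for block in no_dif]
nonuniform[0][0] = [(k + 1) * v for k, v in enumerate(nonuniform[0][0])]
```

In practice the expected counts are unknown and are replaced by fitted counts from a model, so exact equality is not expected and the comparison is made with the likelihood ratio tests discussed in the following sections.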

Loglinear and Logit Models for Studying DIF 

The saturated loglinear model for the three-way table of item response category by group by 
matching variable category is (Mellenbergh, 1982): 

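$$\log(m_{ijk}) = \mu + \lambda_i^Y + \lambda_j^V + \lambda_k^Z + \lambda_{ij}^{YV} + \lambda_{ik}^{YZ} + \lambda_{jk}^{VZ} + \lambda_{ijk}^{YVZ} . \tag{4}$$

One constraint is placed on each of the parameters $\lambda_i^Y$, $\lambda_j^V$, $\lambda_k^Z$ to identify the model (for example,
$\lambda_1^Y = \lambda_1^V = \lambda_1^Z = 0$). Constraints are also placed on the $\lambda_{ij}^{YV}$ ($I + J - 1$ constraints, for example
$\lambda_{1j}^{YV} = \lambda_{i1}^{YV} = 0$ for all $i$, $j$), $\lambda_{ik}^{YZ}$ ($I + K - 1$ constraints, for example $\lambda_{1k}^{YZ} = \lambda_{i1}^{YZ} = 0$ for all
$i$, $k$), and $\lambda_{jk}^{VZ}$ ($J + K - 1$ constraints, for example $\lambda_{1k}^{VZ} = \lambda_{j1}^{VZ} = 0$ for all $j$, $k$). There are
$IJ + IK + JK - I - J - K + 1$ constraints placed on the $\lambda_{ijk}^{YVZ}$, for example $\lambda_{1jk}^{YVZ} = \lambda_{i1k}^{YVZ} =
\lambda_{ij1}^{YVZ} = 0$ for all $i$, $j$, $k$. The model in Equation 4 has no residual degrees of freedom (the model
fits any data perfectly) because it is a saturated model.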
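
The constraint counts can be verified by simple arithmetic: after the constraints are applied, the number of free parameters should equal the number of cells, IJK. A short sketch of that bookkeeping follows; the function name and the term labels are hypothetical.

```python
def saturated_free_parameters(I, J, K):
    """Free parameters per term of Equation 4 once the identifying constraints are applied."""
    total = {"mu": 1, "Y": I, "V": J, "Z": K,
             "YV": I * J, "YZ": I * K, "VZ": J * K, "YVZ": I * J * K}
    constraints = {"mu": 0, "Y": 1, "V": 1, "Z": 1,
                   "YV": I + J - 1, "YZ": I + K - 1, "VZ": J + K - 1,
                   "YVZ": I * J + I * K + J * K - I - J - K + 1}
    return {term: total[term] - constraints[term] for term in total}

# Example: a 4-category item, 2 groups, 10 matching variable categories.
free = saturated_free_parameters(4, 2, 10)
residual_df = 4 * 2 * 10 - sum(free.values())
```

For any I, J, and K the free parameters sum to IJK, so the residual degrees of freedom are zero, which is what it means for the model to be saturated.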

The log of the conditional odds ratios in Equation 3 for the model in Equation 4 are

$$\log\left(\frac{m_{ijk}\, m_{i+1,j+1,k}}{m_{i+1,j,k}\, m_{i,j+1,k}}\right) = \lambda_{ij}^{YV} + \lambda_{i+1,j+1}^{YV} - \lambda_{i+1,j}^{YV} - \lambda_{i,j+1}^{YV} + \lambda_{ijk}^{YVZ} + \lambda_{i+1,j+1,k}^{YVZ} - \lambda_{i+1,j,k}^{YVZ} - \lambda_{i,j+1,k}^{YVZ} . \tag{5}$$

The log-odds ratios given in Equation 5 will generally differ from zero and will not be constant
across levels of the matching variable category. Thus, the DIF implied by the model in Equation 4
is nonuniform DIF.

Mellenbergh (1982) identifies two nonsaturated models based on Equation 4 that are of interest
in the analysis of DIF: one for uniform DIF and one for no DIF. The uniform DIF model is obtained
by eliminating the $\lambda_{ijk}^{YVZ}$ terms from the model of Equation 4:

$$\log(m_{ijk}) = \mu + \lambda_i^Y + \lambda_j^V + \lambda_k^Z + \lambda_{ij}^{YV} + \lambda_{ik}^{YZ} + \lambda_{jk}^{VZ} , \tag{6}$$

with the same constraints on the parameters as were indicated for the model in Equation 4. The log
of the odds ratios in Equation 3 for the model in Equation 6 are

$$\log\left(\frac{m_{ijk}\, m_{i+1,j+1,k}}{m_{i+1,j,k}\, m_{i,j+1,k}}\right) = \lambda_{ij}^{YV} + \lambda_{i+1,j+1}^{YV} - \lambda_{i+1,j}^{YV} - \lambda_{i,j+1}^{YV} . \tag{7}$$

The log-odds ratios in Equation 7 will in general differ from zero but do not differ across levels of the
matching variable. Thus, the model given by Equation 6 implies uniform DIF.

The no DIF model presented by Mellenbergh (1982) is obtained by eliminating the $\lambda_{ij}^{YV}$ terms
from the model in Equation 6:

$$\log(m_{ijk}) = \mu + \lambda_i^Y + \lambda_j^V + \lambda_k^Z + \lambda_{ik}^{YZ} + \lambda_{jk}^{VZ} . \tag{8}$$

The log of the odds ratios in Equation 3 for the model in Equation 8 will all be zero. Thus, the
model in Equation 8 implies no DIF for the item.

To use the models in Equations 4, 6 and 8, Mellenbergh (1982) suggests that first the model
in Equation 6 be fit to the data. If this model does not fit the data (based on Pearson or likelihood
ratio chi-squared statistics) this implies nonuniform DIF for the item. If the model in Equation 6
does fit the data then the model in Equation 8 is fit to the data. If the difference in the likelihood
ratio chi-squared statistics for the models in Equations 8 and 6 is significant this implies the item
exhibits uniform DIF. If the difference in the chi-squared statistics for the models in Equations 8
and 6 is not significant then the item does not exhibit DIF.
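
The two fits needed for this procedure can be sketched in a few lines. The code below is an illustration only: the paper does not say which fitting algorithm was used, so iterative proportional fitting is an assumption here, the function names are hypothetical, and the 2 x 2 x 3 table of counts is made up so that its conditional odds ratio is exactly 2 in every stratum.

```python
import math

def g2(n, m):
    """Likelihood ratio chi-squared statistic: 2 * sum of n * log(n / m)."""
    total = 0.0
    for n_i, m_i in zip(n, m):
        for n_ij, m_ij in zip(n_i, m_i):
            for obs, fit in zip(n_ij, m_ij):
                if obs > 0:  # cells with a zero observed count contribute nothing
                    total += obs * math.log(obs / fit)
    return 2.0 * total

def fit_no_dif(n):
    """Equation 8 (Y and V independent given Z): m_ijk = n_i+k * n_+jk / n_++k."""
    I, J, K = len(n), len(n[0]), len(n[0][0])
    m = [[[0.0] * K for _ in range(J)] for _ in range(I)]
    for k in range(K):
        y = [sum(n[i][j][k] for j in range(J)) for i in range(I)]
        v = [sum(n[i][j][k] for i in range(I)) for j in range(J)]
        for i in range(I):
            for j in range(J):
                m[i][j][k] = y[i] * v[j] / sum(y)
    return m

def fit_uniform_dif(n, sweeps=500):
    """Equation 6 (no three-factor term) by iterative proportional fitting."""
    I, J, K = len(n), len(n[0]), len(n[0][0])
    m = [[[1.0] * K for _ in range(J)] for _ in range(I)]
    for _ in range(sweeps):
        for i in range(I):                      # match the Y x V margin
            for j in range(J):
                f = sum(n[i][j]) / sum(m[i][j])
                m[i][j] = [x * f for x in m[i][j]]
        for i in range(I):                      # match the Y x Z margin
            for k in range(K):
                f = sum(n[i][j][k] for j in range(J)) / sum(m[i][j][k] for j in range(J))
                for j in range(J):
                    m[i][j][k] *= f
        for j in range(J):                      # match the V x Z margin
            for k in range(K):
                f = sum(n[i][j][k] for i in range(I)) / sum(m[i][j][k] for i in range(I))
                for i in range(I):
                    m[i][j][k] *= f
    return m

# Hypothetical counts[i][j][k] with a conditional odds ratio of 2 for every k.
counts = [[[20, 40, 60], [20, 60, 120]],
          [[30, 20, 10], [60, 60, 40]]]
I, J, K = 2, 2, 3
g2_uniform = g2(counts, fit_uniform_dif(counts))  # df = (I - 1)(J - 1)(K - 1)
g2_no_dif = g2(counts, fit_no_dif(counts))        # df = (I - 1)(J - 1)K
df_difference = (I - 1) * (J - 1)                 # df for G2(Eq. 8) - G2(Eq. 6)
```

Because the made-up table satisfies Equation 6 exactly, its likelihood ratio statistic is essentially zero, while the statistic for Equation 8 is not; the difference would be referred to a chi-square distribution with (I - 1)(J - 1) degrees of freedom.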

Logit Models

Mellenbergh (1982) notes that for dichotomous items logit models equivalent to the loglinear
models in Equations 4, 6 and 8 for the purposes of studying DIF can be used. In the logit models
the response variable is $\log(m_{1jk}/m_{2jk})$, where there are only two categories of item response.

In the case in which there are numeric scores associated with the matching variable categories
and/or item response categories this information can be used to create more parsimonious logit (and
loglinear) models for studying DIF. Let the scores associated with the item response categories be
$r_1, r_2, \ldots, r_I$, and let the scores associated with matching variable categories be $s_1, s_2, \ldots, s_K$. It is
assumed the categories are arranged such that $r_1 < r_2 < \cdots < r_I$ and $s_1 < s_2 < \cdots < s_K$.

In the case of a dichotomous item response Swaminathan and Rogers (1990) present logit models
where linear functions of the matching variable score are substituted for the nominal matching
variable effects in the logit models presented by Mellenbergh (1982). This allows for a nonsaturated
logit model for nonuniform DIF (Mellenbergh's logit model for nonuniform DIF is a saturated
model). The model presented by Swaminathan and Rogers (1990) can be written as

$$\log\left(\frac{m_{1jk}}{m_{2jk}}\right) = \alpha_0 + \lambda_j + \alpha_1 s_k + \alpha_{2j} s_k , \tag{9}$$

where there is one constraint put on the $\lambda_j$ (for example, $\lambda_1 = 0$), and one constraint is put on the
$\alpha_{2j}$ (for example, $\alpha_{21} = 0$). The logit model in Equation 9 is equivalent to the following loglinear
model (Agresti, 1990, pages 152-153)

$$\log(m_{ijk}) = \mu + \lambda_i^Y + \lambda_j^V + \lambda_k^Z + \lambda_{ij}^{YV} + \lambda_{jk}^{VZ} + \beta_i s_k + \gamma_{ij} s_k . \tag{10}$$

The same constraints are put on the parameters $\lambda_i^Y$, $\lambda_j^V$, $\lambda_k^Z$, $\lambda_{jk}^{VZ}$, and $\lambda_{ij}^{YV}$ as were put on the
corresponding parameters for the model in Equation 4. One constraint is placed on the $\beta_i$ (for
example, $\beta_1 = 0$) and $(I - 1)(J - 1)$ constraints are placed on the $\gamma_{ij}$ (for example, $\gamma_{1j} = \gamma_{i1} = 0$
for all $i$, $j$). For the loglinear model in Equation 10, unlike the logit model in Equation 9, it is
possible that the number of item response categories could be greater than 2. The log of the odds
ratios in Equation 3 for the model in Equation 10 are given by

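$$\log\left(\frac{m_{ijk}\, m_{i+1,j+1,k}}{m_{i+1,j,k}\, m_{i,j+1,k}}\right) = \lambda_{ij}^{YV} + \lambda_{i+1,j+1}^{YV} - \lambda_{i+1,j}^{YV} - \lambda_{i,j+1}^{YV} + (\gamma_{ij} + \gamma_{i+1,j+1} - \gamma_{i+1,j} - \gamma_{i,j+1})\, s_k . \tag{11}$$

The log of the odds ratios in Equation 11 are linear functions of the matching variable score and
therefore represent nonuniform DIF.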
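
For a dichotomous item the same linearity can be seen directly from the logit form in Equation 9, since the log odds ratio between two groups is the difference of their logits. The sketch below illustrates this with made-up parameter values; the function names and all numbers are hypothetical, and groups are indexed from 0.

```python
def logit_eq9(j, s, alpha0, lam, alpha1, alpha2):
    """log(m_1jk / m_2jk) under Equation 9 for group j at matching score s."""
    return alpha0 + lam[j] + alpha1 * s + alpha2[j] * s

def log_odds_ratio(j, s, alpha0, lam, alpha1, alpha2):
    """Log of the Equation 3 odds ratio comparing groups j and j + 1 at score s."""
    return (logit_eq9(j, s, alpha0, lam, alpha1, alpha2)
            - logit_eq9(j + 1, s, alpha0, lam, alpha1, alpha2))

# Hypothetical parameter values, with the constraints lam[0] = 0 and alpha2[0] = 0.
params = dict(alpha0=-1.0, lam=[0.0, 0.4], alpha1=0.15, alpha2=[0.0, -0.05])
scores = [0, 5, 10, 15, 20]
nonuniform = [log_odds_ratio(0, s, **params) for s in scores]

# Setting every alpha2[j] to zero removes the dependence on the score.
uniform_params = dict(params, alpha2=[0.0, 0.0])
uniform = [log_odds_ratio(0, s, **uniform_params) for s in scores]
```

The intercept and the common slope cancel in the difference, so only the group effects and the group-specific slopes determine the DIF.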

Eliminating the $\gamma_{ij}$ terms from the model in Equation 10 gives

$$\log(m_{ijk}) = \mu + \lambda_i^Y + \lambda_j^V + \lambda_k^Z + \lambda_{ij}^{YV} + \lambda_{jk}^{VZ} + \beta_i s_k . \tag{12}$$

The log of the odds ratios in Equation 3 for the model in Equation 12 are

$$\log\left(\frac{m_{ijk}\, m_{i+1,j+1,k}}{m_{i+1,j,k}\, m_{i,j+1,k}}\right) = \lambda_{ij}^{YV} + \lambda_{i+1,j+1}^{YV} - \lambda_{i+1,j}^{YV} - \lambda_{i,j+1}^{YV} . \tag{13}$$

Equation 13 is in general different from zero but constant for all values of the matching variable
score. Consequently, the model in Equation 12 represents uniform DIF. The log of the odds ratios
in Equations 13 and 7 are identical since the only difference between the models in Equations 12
and 6 is in the interaction terms involving the item response and matching variable, which cancel out
when computing the odds ratio.

Eliminating the $\lambda_{ij}^{YV}$ from the model in Equation 12 gives

$$\log(m_{ijk}) = \mu + \lambda_i^Y + \lambda_j^V + \lambda_k^Z + \lambda_{jk}^{VZ} + \beta_i s_k . \tag{14}$$

The log of the odds ratios in Equation 3 for the model in Equation 14 are all zero. Consequently,
the model in Equation 14 represents no DIF.

Comparing the fits of the models in Equations 10 and 12 gives a test for nonuniform DIF, and
comparing the fits of the models in Equations 12 and 14 gives a test of uniform DIF.

For the case in which there are two groups but more than two item response categories Miller
and Spray (1993) suggest using a logit model with group as the response variable. Their model can
be written as

$$\log\left(\frac{m_{i1k}}{m_{i2k}}\right) = \alpha_0 + \alpha_1 s_k + \alpha_2 r_i + \alpha_3 s_k r_i . \tag{15}$$

The logit model in Equation 15 can be written as the following loglinear model

$$\log(m_{ijk}) = \mu + \lambda_i^Y + \lambda_j^V + \lambda_k^Z + \lambda_{ik}^{YZ} + \beta_{1j} s_k + \beta_{2j} r_i + \gamma_j s_k r_i . \tag{16}$$

The same constraints are put on the parameters $\lambda_i^Y$, $\lambda_j^V$, $\lambda_k^Z$, and $\lambda_{ik}^{YZ}$ as were put on the corresponding
parameters for the model in Equation 4. One constraint is put on each of the parameters $\beta_{1j}$, $\beta_{2j}$
and $\gamma_j$ (for example, $\beta_{11} = \beta_{21} = \gamma_1 = 0$). The log of the odds ratios in Equation 3 for the model
in Equation 16 are (taking adjacent item category scores to be one unit apart, $r_{i+1} - r_i = 1$)

$$\log\left(\frac{m_{ijk}\, m_{i+1,j+1,k}}{m_{i+1,j,k}\, m_{i,j+1,k}}\right) = \beta_{2,j+1} - \beta_{2j} + (\gamma_{j+1} - \gamma_j)\, s_k . \tag{17}$$

The log-odds ratios in Equation 17 will in general be different from zero and are a linear function of
the matching variable score. Consequently, nonuniform DIF is implied by the model in Equation
16.



Eliminating the terms involving $\gamma_j$ from the model in Equation 16 gives

$$\log(m_{ijk}) = \mu + \lambda_i^Y + \lambda_j^V + \lambda_k^Z + \lambda_{ik}^{YZ} + \beta_{1j} s_k + \beta_{2j} r_i . \tag{18}$$

The log of the odds ratios in Equation 3 for the model in Equation 18 are

$$\log\left(\frac{m_{ijk}\, m_{i+1,j+1,k}}{m_{i+1,j,k}\, m_{i,j+1,k}}\right) = \beta_{2,j+1} - \beta_{2j} . \tag{19}$$

The log-odds ratios in Equation 19 will in general differ from zero, but do not vary with the matching
variable score. Consequently, uniform DIF is implied by the model in Equation 18.

Eliminating the $\beta_{2j}$ from Equation 18 gives

$$\log(m_{ijk}) = \mu + \lambda_i^Y + \lambda_j^V + \lambda_k^Z + \lambda_{ik}^{YZ} + \beta_{1j} s_k . \tag{20}$$

The log of the odds ratios in Equation 3 for the model in Equation 20 will all be zero. Consequently,
no DIF is implied by Equation 20.

Comparing the fits of the models in Equations 16 and 18 gives a test for nonuniform DIF, and
comparing the fits of the models in Equations 18 and 20 gives a test of uniform DIF.

An advantage of using the logit form of the models in Equations 9 and 15 as opposed to
the loglinear form of these models is that there are far fewer parameters to estimate in the logit
formulation. A possible advantage of using the loglinear formulation of the models as opposed to
the logit formulation is that the loglinear models can be generalized to deal with more than 2 item
response categories (I > 2) and more than 2 groups (J > 2) without the complication of having to
deal with a polytomous dependent variable (Equation 9 cannot model more than two item response
categories, and Equation 15 cannot model more than two groups).

The next section presents a loglinear model in which the scores on the item responses and 
matching variable are used in a way that results in far fewer model parameters than for the loglinear 
forms of the logit models, or the loglinear models presented in Equations 4, 6 and 8. 

A Polynomial Loglinear Model for Studying DIF 

Loglinear models with polynomial terms involving test and item scores (polynomial loglinear
models) have been used in several measurement applications. Examples include smoothing of test
score distributions (Holland and Thayer, 1987; Kolen, 1991), equating (Rosenbaum and Thayer,
1987; Hanson, 1991; Livingston, 1993; Little and Rubin, 1994), and testing for differences in
score distributions among groups (Hanson, 1992). This section presents a model for the three-way
table of item response, group, and matching variable used to investigate DIF that is analogous to
polynomial loglinear models previously used in the literature.

In the loglinear models in Equations 4, 6, and 8 the item response variable and matching 
variable are treated as nominal. When there are scores associated with the item response categories 
and the matching variable categories the following loglinear model can be used 

$$\log(m_{ijk}) = \mu + \lambda_j^V + \sum_{g=1}^{d_1} \beta_{1gj}\, s_k^g + \sum_{h=1}^{d_2} \beta_{2hj}\, r_i^h + \sum_{g=1}^{d_1} \sum_{h=1}^{d_2} \gamma_{ghj}\, s_k^g\, r_i^h , \tag{21}$$

where $d_1 < K$, $d_2 < I$. As in Equation 4 a constraint is put on the $\lambda_j^V$. There are no constraints
put on the $\beta$ parameters. A subset of the $\gamma_{ghj}$ are assumed to be nonzero, and the rest are assumed
to be zero. If it is assumed $\gamma_{g^*h^*j^*} \ne 0$ for particular values $g^*$, $h^*$ and $j^*$ then it is also assumed
that $\gamma_{g^*h^*j} \ne 0$ for all $j \ne j^*$. Consequently, the number of $\gamma_{ghj} \ne 0$ is $d_3 J$ for some positive
integer $d_3$. The value of $d_3$ is equal to the number of the $d_1 \times d_2$ possible $\gamma_{ghj}$ in each group that
are specified to be nonzero. The value of $d_3$ is not directly related to the values of $d_1$ and $d_2$; for
example, $d_3$ is not the sum of $d_1$ and $d_2$. Note that the models in Equations 10 and 16 are not special
cases of the model in Equation 21.

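The log of the conditional odds ratios in Equation 3 for the model in Equation 21 is

$$\log\left(\frac{m_{ijk}\, m_{i+1,j+1,k}}{m_{i+1,j,k}\, m_{i,j+1,k}}\right) = \sum_{h=1}^{d_2} (\beta_{2h,j+1} - \beta_{2hj})(r_{i+1}^h - r_i^h) + \sum_{g=1}^{d_1} \sum_{h=1}^{d_2} (\gamma_{gh,j+1} - \gamma_{ghj})\, s_k^g\, (r_{i+1}^h - r_i^h) . \tag{22}$$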

Equation 22 represents nonuniform DIF. The DIF given in Equation 22 is constrained relative to the
DIF given by the saturated loglinear model (Equation 4). The model in Equation 21 is a nonsaturated
loglinear model that allows for nonuniform DIF. Comparing Equation 22 to Equations 11 and 17 it
is seen that the loglinear model in Equation 21 allows for more complicated forms of DIF than the
models in Equations 10 and 16. In particular, the model in Equation 21 allows DIF which varies
across adjacent item response categories.
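
The cancellation that leads from the polynomial model to its log odds ratios can be confirmed numerically: the terms that do not involve both the item score and the group drop out, leaving only differences across groups of the item score and interaction coefficients. The sketch below evaluates the polynomial model directly and compares the result with that closed form. All parameter values, the function names, and the 0-based indexing are hypothetical choices made for illustration.

```python
def log_m(i, j, k, mu, lam, beta1, beta2, gamma, r, s):
    """log(m_ijk) under Equation 21; gamma[j] maps (g, h) to the nonzero coefficients."""
    value = mu + lam[j]
    value += sum(b * s[k] ** (g + 1) for g, b in enumerate(beta1[j]))
    value += sum(b * r[i] ** (h + 1) for h, b in enumerate(beta2[j]))
    value += sum(c * s[k] ** g * r[i] ** h for (g, h), c in gamma[j].items())
    return value

def log_odds_direct(i, j, k, p):
    """Log of the Equation 3 odds ratio computed from four cells of Equation 21."""
    return (log_m(i, j, k, **p) + log_m(i + 1, j + 1, k, **p)
            - log_m(i + 1, j, k, **p) - log_m(i, j + 1, k, **p))

def log_odds_closed_form(i, j, k, p):
    """The same quantity from the closed form given in Equation 22."""
    beta2, gamma, r, s = p["beta2"], p["gamma"], p["r"], p["s"]
    value = sum((beta2[j + 1][h] - beta2[j][h]) * (r[i + 1] ** (h + 1) - r[i] ** (h + 1))
                for h in range(len(beta2[j])))
    value += sum((gamma[j + 1][(g, h)] - c) * s[k] ** g * (r[i + 1] ** h - r[i] ** h)
                 for (g, h), c in gamma[j].items())
    return value

# Hypothetical model: I = 3, J = 2, K = 4, d1 = d2 = 2, d3 = 2 (gamma_11 and gamma_12).
params = dict(mu=1.0, lam=[0.0, 0.3],
              beta1=[[0.5, -0.1], [0.4, -0.05]],
              beta2=[[0.3, 0.02], [0.1, 0.05]],
              gamma=[{(1, 1): 0.20, (1, 2): -0.03}, {(1, 1): 0.25, (1, 2): -0.01}],
              r=[0, 1, 2], s=[0, 1, 2, 3])
max_gap = max(abs(log_odds_direct(i, j, k, params) - log_odds_closed_form(i, j, k, params))
              for i in range(2) for j in range(1) for k in range(4))
```

With these values the log odds ratio at the second matching score is about -0.10 for the first pair of adjacent item categories and about 0 for the second pair, which illustrates DIF that varies across adjacent item response categories.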

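The constrained version of the model given in Equation 21 which implies uniform DIF is

$$\log(m_{ijk}) = \mu + \lambda_j^V + \sum_{g=1}^{d_1} \beta_{1gj}\, s_k^g + \sum_{h=1}^{d_2} \beta_{2hj}\, r_i^h + \sum_{g=1}^{d_1} \sum_{h=1}^{d_2} \gamma_{gh}\, s_k^g\, r_i^h . \tag{23}$$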

The model in Equation 23 differs from the model in Equation 21 by not having the $\gamma_{gh}$ parameters
differ for the different groups. The difference in the number of parameters between the models
in Equations 23 and 21 is $d_3(J - 1)$. The log of the conditional odds ratios in Equation 3 for the
model in Equation 23 is

$$\log\left(\frac{m_{ijk}\, m_{i+1,j+1,k}}{m_{i+1,j,k}\, m_{i,j+1,k}}\right) = \sum_{h=1}^{d_2} (\beta_{2h,j+1} - \beta_{2hj})(r_{i+1}^h - r_i^h) . \tag{24}$$

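The constrained version of the model given in Equation 23 which implies no DIF is

$$\log(m_{ijk}) = \mu + \lambda_j^V + \sum_{g=1}^{d_1} \beta_{1gj}\, s_k^g + \sum_{h=1}^{d_2} \beta_{2h}\, r_i^h + \sum_{g=1}^{d_1} \sum_{h=1}^{d_2} \gamma_{gh}\, s_k^g\, r_i^h . \tag{25}$$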

The model in Equation 25 differs from the model in Equation 23 by not having the $\beta_{2h}$ parameters
differ for the different groups. The difference in the number of parameters for the models given
in Equations 25 and 23 is $d_2(J - 1)$. The log of the odds ratios in Equation 3 for the model in
Equation 25 are all zero.

The likelihood ratio chi-squared statistics for the models in Equations 21 and 23 can be used to
test for nonuniform DIF. Under the hypothesis that the model in Equation 23 holds, the difference in
the likelihood ratio chi-squared statistics between models 21 and 23 is asymptotically distributed as
a chi-square random variable with $d_3(J - 1)$ degrees of freedom. For a level of significance p, the
hypothesis that the model in Equation 23 holds (uniform DIF) versus the alternative hypothesis that
the model in Equation 21 holds (nonuniform DIF) is rejected if the difference in the likelihood ratio
chi-square statistics of the models in Equations 21 and 23 is greater than the upper p percentage
point for the chi-square distribution with $d_3(J - 1)$ degrees of freedom.

To test for uniform DIF the likelihood ratio chi-squared statistics for the models in Equations
23 and 25 can be used. Under the hypothesis that the model in Equation 25 holds, the difference in
the likelihood ratio chi-squared statistics between models 23 and 25 is asymptotically distributed
as a chi-square random variable with $d_2(J - 1)$ degrees of freedom. For a level of significance p,
the hypothesis that the model in Equation 25 holds (no DIF) versus the alternative hypothesis that
the model in Equation 23 holds (uniform DIF) is rejected if the difference in the likelihood ratio
chi-square statistics of the models in Equations 23 and 25 is greater than the upper p percentage
point for the chi-square distribution with $d_2(J - 1)$ degrees of freedom.
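
The two tests can be sketched as follows, assuming the three likelihood ratio statistics have already been obtained from fitting the models. The function names and the example statistics are hypothetical. To keep the sketch free of external libraries, the chi-square upper tail probability is computed with the standard closed-form series for integer degrees of freedom rather than with a statistics package.

```python
import math

def chi2_sf(x, df):
    """Upper tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum of x^(r - 1/2) / (1*3*...*(2r-1))
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total

def dif_tests(g2_eq21, g2_eq23, g2_eq25, d2, d3, J):
    """p-values for the nonuniform and uniform DIF tests described above."""
    df_nonuniform = d3 * (J - 1)
    df_uniform = d2 * (J - 1)
    return {
        "df_nonuniform": df_nonuniform,
        "p_nonuniform": chi2_sf(g2_eq23 - g2_eq21, df_nonuniform),
        "df_uniform": df_uniform,
        "p_uniform": chi2_sf(g2_eq25 - g2_eq23, df_uniform),
    }

# Hypothetical statistics for a two-group analysis with d2 = 3 and d3 = 2.
result = dif_tests(g2_eq21=40.2, g2_eq23=41.5, g2_eq25=60.0, d2=3, d3=2, J=2)
```

For these made-up values the nonuniform test has 2 degrees of freedom and a large p-value, while the uniform test has 3 degrees of freedom and a very small one, so the conclusion would be uniform DIF.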

Choosing a Model 

Using the models in Equations 21, 23 and 25 involves choosing values for $d_1$ and $d_2$, and
choosing which of the $\gamma_{ghj}$ to make nonzero. The values of $d_1$, $d_2$, and which $\gamma_{ghj}$ to make nonzero
are chosen based on the model in Equation 21, and are used for the models in Equations 23 and 25
in testing for uniform and nonuniform DIF.

A model selection procedure presented by Haberman (1974) can be used for choosing a model
in the form of Equation 21 from a set of possible models (different values of $d_1$, $d_2$ and nonzero $\gamma_{ghj}$).
To apply Haberman's (1974) procedure it is assumed that a set of $q$ models have been identified
($M_1, M_2, \ldots, M_q$) where model $M_{i-1}$ is nested within model $M_i$, $i = 2, \ldots, q$ ($M_1$ is the simplest
model, and $M_q$ is the most complex model). If $G_i^2$ is the likelihood ratio chi-square statistic for model
$M_i$ then for $i = 2, \ldots, q$, $G_{i-1}^2 - G_i^2$ is the likelihood ratio statistic for testing the null hypothesis
$H_{i-1}$ versus the alternative hypothesis $H_i$, where $H_i$ is the hypothesis that model $M_i$ holds. If the
hypothesis $H_{i^*}$ is true then the statistics $G_{i-1}^2 - G_i^2$ for $i = q, q - 1, \ldots, i^* + 1$ are asymptotically
independent and have chi-square distributions with $\omega_i$ degrees of freedom, where $\omega_i$ is equal to the
difference in the number of parameters of models $M_i$ and $M_{i-1}$. For a level of significance p, with
$p^* = 1 - (1 - p)^{1/(q-1)}$, the probability that any of the $G_{i-1}^2 - G_i^2$, $i = q, q - 1, \ldots, i^* + 1$, exceeds $C_i$, the upper
$p^*$ percentage point for the chi-square distribution with $\omega_i$ degrees of freedom, is asymptotically no
greater than p. A simultaneous test of the hypotheses $H_i$, $i = q - 1, q - 2, \ldots, 1$, is to reject all
hypotheses $H_i$ such that $i < i'$, where $i'$ is the largest $i$ such that $G_{i-1}^2 - G_i^2 > C_i$. With a specified
value of p, this hypothesis testing procedure would allow one to eliminate from consideration
models $M_i$, $i < i'$. It gives no guidance for choosing from among the models $M_i$, $i \ge i'$, although
typically model $M_{i'}$ (the simplest of these models) is chosen. Smaller values of p would make it
harder to reject the null hypothesis of the simpler model and therefore favor the selection of simpler
models.
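
The selection rule can be sketched in code. The version below works with p-values for the adjacent-model comparisons, which is equivalent to comparing each statistic with its critical value, and returns the index of the model that is typically chosen. The function names and the example p-values are hypothetical and are not taken from the paper's tables.

```python
def per_test_level(p, q):
    """Level p* = 1 - (1 - p)^(1 / (q - 1)) used for each of the q - 1 adjacent tests."""
    return 1.0 - (1.0 - p) ** (1.0 / (q - 1))

def haberman_select(p_values, p):
    """Return i' (1-based model index) for a nested sequence M_1, ..., M_q.

    p_values[i - 2] is the p-value of the statistic G2_{i-1} - G2_i, i = 2, ..., q.
    The search starts with the most complex comparison and stops at the first
    significant one; if no comparison is significant the simplest model is kept.
    """
    q = len(p_values) + 1
    p_star = per_test_level(p, q)
    for i in range(q, 1, -1):
        if p_values[i - 2] < p_star:   # same event as the statistic exceeding C_i
            return i
    return 1

# Hypothetical p-values for six nested models (comparisons 1-2, 2-3, 3-4, 4-5, 5-6).
example_p_values = [1e-10, 1e-5, 0.30, 0.60, 0.46]
selected = haberman_select(example_p_values, p=0.01)
```

With these made-up p-values the comparisons of models 5 and 6, 4 and 5, and 3 and 4 are not significant at the per-test level, the comparison of models 2 and 3 is, and so model 3 is selected.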

The selection procedure of Haberman (1974) requires that the models being considered form 
a nested sequence. Especially in the case of non-dichotomous items it is possible that the set of 
models under consideration do not form a nested sequence. In that case the Haberman model 
selection procedure is not directly applicable. A series of model comparisons could be performed, 
but the tests would no longer be independent and the error rate given by the Haberman procedure 
will no longer be accurate. The first example presented in the next section provides an example of
using a modification of the Haberman procedure to select a model for polytomous items.

In applied settings it may not be realistic to use a model selection procedure for each item. 
A more realistic procedure may be to select a common model for all items with a specific number 
of score categories, perhaps based on past experience. This procedure is used in the second of the 
following examples. 

Examples

Two examples of applying the polynomial loglinear model are presented in this section. First, 
the polynomial loglinear model is applied to the 27 item data set analyzed in Miller and Spray (1993).
The results from the polynomial loglinear model are compared to results using the logit model of 
Miller and Spray (1993). The second example consists of applying the polynomial loglinear model 
to investigate DIF for common items in a common-item nonequivalent groups equating design. 
The goal is to examine if any of the common items function differently on the two test dates (the 
common items are embedded within different test forms on the two test dates). 

Matching Variable 

For both examples the matching variable will be a test score consisting of the sum of the item 
scores. The issue discussed in this section is whether to use as the matching variable the sum of the 
item scores including the studied item, or the sum of the item scores excluding the studied item. 

Several authors have used theoretical justifications to conclude that a matching variable that 
is the sum of item scores should include the studied item (Holland and Thayer, 1988; Zwick, 1990; 
Meredith and Millsap, 1992). If there is a latent variable under which local independence holds 
for the item scores then a test score which excludes the studied item score will be conditionally 
independent of the studied item score given the latent variable. Under this condition and some 
other fairly mild conditions, Meredith and Millsap (1992) show that DIF will appear when the
test score excluding the studied item score is used as a matching variable even if there is no DIF 
in either the studied item score or the test score when the latent variable is used as the matching 
variable. Under these conditions even though there is no DIF when using the latent variable as 
the matching variable, DIF will be observed when the test score excluding the studied item score 
is used as the matching variable. It is only under very special conditions that using the test score 
including the studied item score will alleviate this problem (e.g., the Rasch model holds for the item
responses, Holland and Thayer, 1988). Consequently, theoretical analysis suggests the problem of 
DIF being detected when using an observed matching variable when no DIF exists using the ideal 
latent matching variable will occur in many practical situations whether or not the test score used 
for the observed matching variable includes or excludes the studied item score. 

In the cases considered in this paper the item category scores for all items are $0, 1, \ldots, I - 1$,
where there are $I$ item response categories. Let $m_{ijk}$ be the expected count corresponding to item
response category $i$, group $j$, and matching variable category $k$, where the matching variable is
the total test score excluding the score for the studied item. Let $m^*_{ijk}$ be the expected counts in the
three-way table where the matching variable is the total test score including the score for the studied
item. If there are $I$ item response categories, $J$ groups, and $K$ score categories for the test score
excluding the studied item, then the table containing the expected counts $m_{ijk}$ has $I \times J \times K$ cells
and the table containing the expected counts $m^*_{ijk}$ has $I \times J \times (K + I - 1)$ cells. The expected
counts $m^*_{ijk}$ can be written in terms of the expected counts $m_{ijk}$ as

$$m^*_{ijk} = \begin{cases} m_{i,j,k-i+1} & i \le k \le K + i - 1 \\ 0 & k < i,\ k > K + i - 1 . \end{cases} \tag{26}$$



For example, consider group 1 and item response category 2. Assuming item response categories
are ordered by the category scores, then item response category 2 corresponds to an item score of
1. From Equation 26, $m^*_{21k} = m_{2,1,k-1}$ for $2 \le k \le K + 1$. This is because any examinee who
obtained a score of 1 on the item would have a test score including the item that was one greater than
their test score excluding the item. For $k = 1$, Equation 26 gives $m^*_{211} = 0$ since if an examinee
obtained a score of 1 on the item, the test score including this item could not be zero. In addition,
for $k > K + 1$ Equation 26 gives $m^*_{21k} = 0$. This is because if an examinee obtained a score of 1
on the item, their maximum test score including the item corresponds to matching variable score
category $K + 1$ (one more than the maximum test score excluding the item).

Equation 26 shows that the expected counts in the table corresponding to the test score including
the studied item can be written in terms of the expected counts in the table corresponding to test score
excluding the studied item. Even though there are $IJ(I - 1)$ more cells in the table corresponding
to the test score including the studied item, that table will have $IJ(I - 1)$ cells with structural
zeros. Including the studied item score in the test score creates a table with more cells but no more
information. A loglinear model fit to the table corresponding to the test score excluding the studied
item would give the same results as a loglinear model fit to the table corresponding to the test score
including the studied item as long as the structural zeros in that table were taken into account when
fitting the model. Different results would be obtained if the model fit to the table corresponding to
the test score including the studied item allowed all the cells in the table to have non-zero expected
counts (which would result in non-zero fitted counts for cells in which the fitted count by definition
should be zero).
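
The mapping between the two tables amounts to shifting each item response category's counts up by its item score. A minimal sketch follows, assuming item category scores 0, 1, ..., I - 1 as in the paper; the function name and the example table of counts are hypothetical, and indices are 0-based.

```python
def include_studied_item(m):
    """Map m[i][j][k] (matching score excludes the item) to m_star[i][j][k] (includes it)."""
    I, J, K = len(m), len(m[0]), len(m[0][0])
    m_star = [[[0.0] * (K + I - 1) for _ in range(J)] for _ in range(I)]
    for i in range(I):
        for j in range(J):
            for k in range(K):
                # With 0-based indices the item score is i, so the category shifts up by i.
                m_star[i][j][k + i] = m[i][j][k]
    return m_star

# Hypothetical table with I = 3 item categories, J = 2 groups, K = 4 score categories.
I, J, K = 3, 2, 4
m = [[[float(1 + i + 2 * j + k) for k in range(K)] for j in range(J)] for i in range(I)]
m_star = include_studied_item(m)
structural_zeros = sum(1 for i in range(I) for j in range(J) for v in m_star[i][j] if v == 0.0)
```

Every count in the original table appears exactly once in the larger table, and the number of cells left at zero equals IJ(I - 1), which is 12 for this example.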

Consequently, in the present setting the estimated counts and model fits would be the same 
whether the studied item is included in the test score or not (as long as structural zeros are preserved 
when including the item score in the test score). The matching variable used in the examples is the 
test score excluding the item score.



The first example uses the same data used for the example in Miller and Spray (1993). The data 
consists of responses of 1976 examinees to a 27 item experimental mathematics performance test. 
The test consisted of 12 multiple-choice items (items 1 through 12), 9 gridded-response items (items
13 through 21), and 6 open-ended items (items 22 through 27). The multiple-choice and gridded-response
items were scored dichotomously (one for a correct response and zero for an incorrect
response). The scores on the open-ended items were 0, 1, 2, ..., k, where k = 3, 3, 4, 4, 5, 6 for
items 22 through 27, respectively. DIF was investigated for males versus females. There were 
1005 male and 971 female examinees in the data set. One male examinee included in the data 
analyzed by Miller and Spray (1993) was dropped from the analyses reported here because all of 
his responses to the polytomous items were missing. 

The first step in fitting the loglinear models given in Equations 21, 23, and 25 is determining
the number of parameters to use in the models (the values of $d_1$, $d_2$, and $d_3$). A modified version
of the Haberman procedure described above is used to select values of $d_1$, $d_2$ and $d_3$. Models are
considered with values of $d_1$ ranging from 1 to 6, values of $d_2$ ranging from 1 to the maximum score
on the item ($I - 1$), and $d_3$ ranging from 1 to 5. The five interaction parameters considered were
$\gamma_{11}$, $\gamma_{12}$, $\gamma_{21}$, $\gamma_{13}$, and $\gamma_{31}$. A model with $d_3 = l$ would include only the first $l$ of these interaction
parameters. For example, if $d_3 = 1$ then the only interaction parameter in the model would be $\gamma_{11}$.
If $d_3 = 3$, then the three interaction parameters in the model would be $\gamma_{11}$, $\gamma_{12}$, and $\gamma_{21}$.

For the dichotomously scored items (items 1 through 21) the only possible value of $d_2$ is 1,
and $d_3$ was set equal to 1. Consequently, for these items choosing a model involves choosing a value
of $d_1$. For the dichotomous items the Haberman procedure was applied for the sequence of models
corresponding to $d_1 = 1, 2, \ldots, 6$. The overall level of significance was chosen to be .01, so the
value of $p^*$ used for each individual test of the nested models was $1 - (1 - .01)^{1/5} = .002$.

For the polytomous items (items 22 through 27) values must be chosen for $d_1$, $d_2$ and $d_3$ rather
than for just $d_1$. Rather than specifying one sequence of nested models, a sequence of nested
models was specified separately for $d_1$, $d_2$, and $d_3$. The Haberman model selection procedure was
applied three times: once for $d_3$, once for $d_2$, and once for $d_1$. An error rate of .003 was chosen
for each of the three separate Haberman procedures, resulting in an overall error rate of at most
.009 (by the Bonferroni inequality) for the three procedures taken as a whole. When selecting $d_3$,
$d_1$ and $d_2$ were set equal to their maximum values (6 for $d_1$ and $I - 1$ for $d_2$). The Haberman
procedure was applied to a sequence of models given by $d_3 = 1, 2, \ldots, 5$. The overall level of
significance chosen was .003, so the value of $p^*$ used for each individual test of the nested models
was $1 - (1 - .003)^{1/4} = .00075$.

When selecting $d_2$, $d_1$ was set equal to 6 and $d_3$ was set equal to the value determined in the first
step. The Haberman procedure was applied to a sequence of models given by $d_2 = 1, 2, \ldots, I - 1$.
For an overall level of significance of .003, the value of $p^*$ used for each of the individual tests of
the nested models was $1 - (1 - .003)^{1/(I-2)}$.

When selecting $d_1$, the values of $d_2$ and $d_3$ were set equal to the values chosen in the previous
steps. The Haberman procedure was applied to a sequence of models given by $d_1 = 1, 2, \ldots, 6$.
For an overall level of significance of .003, the value of $p^*$ used for each of the individual tests of
the nested models was $1 - (1 - .003)^{1/5} = .0006$.

Examples of applying the model selection procedure to items 7 and 23 are presented in Table
1. The top part of Table 1 gives results for item 7. A nested sequence of six models was compared.
Chi-square statistics for comparing adjacent models and their degrees of freedom and p-values are
presented in the last three columns. The first two models to be compared are those given in the first
two rows. For both these models $d_2 = 1$ and $d_3 = 1$. The model in the first row has $d_1 = 6$ and
the model in the second row has $d_1 = 5$. The chi-square statistic for testing the null hypothesis
of the model with $d_1 = 5$ against the alternative hypothesis of the model with $d_1 = 6$ is given as
.544. The value of $p^*$ chosen for each of the tests of consecutive models is .002. Consequently, for
the first test ($d_1 = 5$ versus $d_1 = 6$) the null hypothesis of the simpler model is not rejected. The
first test that is significant at the .002 level is the test for $d_1 = 2$ versus $d_1 = 3$. Consequently, the
model selected is $d_1 = 3$ (this is indicated in the table by the value of $p^*$ being next to the model
with $d_1 = 3$).
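The selection logic described for item 7 can be sketched as a simple backward scan: start from the most complex model, test each simpler model against the next more complex one, and stop at the first significant test, keeping the more complex model of that pair. The Python sketch below is only an illustration of that reading of the procedure. The p-values are hypothetical placeholders (Table 1 is the authoritative source), and returning the simplest model when no test is significant is an assumption.

```python
def select_order(p_values, max_order, per_test_level):
    """Scan adjacent-model tests from the most complex model downward.

    p_values[0] tests order (max_order - 1) against max_order,
    p_values[1] tests order (max_order - 2) against (max_order - 1), etc.
    Returns the more complex order of the first significant comparison.
    """
    order = max_order
    for p in p_values:
        if p < per_test_level:
            return order  # simpler model rejected: keep the current order
        order -= 1        # simpler model retained: move down one step
    return order          # assumption: no rejection means the simplest model


# Hypothetical p-values for 5 vs 6, 4 vs 5, 3 vs 4, 2 vs 3, and 1 vs 2.
hypothetical_p = [0.46, 0.31, 0.12, 0.0004, 0.0001]
selected = select_order(hypothetical_p, max_order=6, per_test_level=0.002)
print(selected)
```

With these placeholder values the first significant comparison is 2 versus 3, so the order selected is 3, matching the pattern described for item 7.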

For item 23 separate selection procedures were used for $d_3$, $d_2$, and $d_1$. For item 23 the first
five lines correspond to models with five different values of $d_3$. For these models $d_1$ and $d_2$ were
fixed at their maximum values of 6 and 3, respectively. The level of significance chosen for the
tests of consecutive models was .00075. The first model comparison was for $d_3 = 4$ versus $d_3 = 5$.
The chi-square statistic for this test is 21.34 with 2 degrees of freedom, which is significant at the
.00075 level. Consequently, the simpler model with $d_3 = 4$ is rejected, and the value of $d_3 = 5$
is chosen. Next, three models corresponding to three values of $d_2$ are compared (with $d_1 = 6$ and
$d_3 = 5$). In this case a value of $d_2 = 3$ is selected. Finally, there are six models corresponding to
$d_1 = 6, 5, \ldots, 1$ (with $d_2 = 3$ and $d_3 = 5$). A value of $d_1 = 4$ is chosen. Thus, for item 23 the
model with $d_1 = 4$, $d_2 = 3$, and $d_3 = 5$ is used to test for uniform and nonuniform DIF.
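As a quick check of the first comparison for item 23, the upper-tail probability of a chi-square variable with 2 degrees of freedom has the closed form $\exp(-x/2)$, so the p-value for 21.34 can be computed without a statistics library. The helper name below is illustrative only.

```python
import math


def chi2_upper_tail_2df(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


p_value = chi2_upper_tail_2df(21.34)
per_test_level = 0.00075
print(p_value < per_test_level)
```

The resulting p-value is far below .00075, consistent with rejecting the simpler model.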

For all dichotomous items, except items 7 and 13, $d_1$ was selected to equal 4, while for
dichotomous items 7 and 13 $d_1$ was selected to be 3. The models chosen for the polytomous items
are given in Table 2. For all the polytomous items, $d_1 = 4$ and $d_2 = I - 1$ (the maximum score
on the item). Varying numbers of interaction terms were chosen for the polytomous items. For
example, for polytomous item 24 only one interaction parameter between item response and score
level was chosen. For polytomous item 27, four interaction parameters were chosen.

The values of $d_1$, $d_2$, and $d_3$ chosen were used in fitting the models in Equations 21, 23, and
25 for each item. Likelihood ratio chi-square statistics for testing for uniform and nonuniform DIF
were computed. When reporting results, three levels of significance are used: 0.05, 0.01, and
0.05 / 27 = 0.00185 (a Bonferroni adjustment).
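The general form of a likelihood ratio chi-square statistic is $G^2 = 2 \sum n \log(n / m)$, where $n$ are observed counts and $m$ are counts fitted under the null model. The Python sketch below illustrates this on a single 2 x 2 table (group by item response at one score level), comparing the independence model with the saturated model. It is not the fitting procedure for Equations 21, 23, and 25, which are not reproduced here, and the counts are hypothetical.

```python
import math


def independence_fit(table):
    """Fitted counts for a 2 x 2 table under independence of rows and columns."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [rows[i] * cols[j] / n for i in range(2) for j in range(2)]


def g_squared(observed, fitted):
    """Likelihood ratio chi-square: 2 * sum(obs * log(obs / fit)); empty cells add 0."""
    return 2.0 * sum(o * math.log(o / f) for o, f in zip(observed, fitted) if o > 0)


# Hypothetical counts: rows are groups, columns are incorrect/correct responses.
table = [[30, 10], [10, 30]]
observed = [count for row in table for count in row]
g2 = g_squared(observed, independence_fit(table))
print(round(g2, 2))
```

In the paper's setting the same kind of statistic compares nested polynomial loglinear models fitted over all score levels rather than one table at a time.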

Significance levels for tests of uniform and nonuniform DIF are shown in Table 3 for all items.
For uniform DIF, thirteen items reached the .05 level of significance, ten items reached the .01
level of significance, and eight items reached the .00185 level of significance. For nonuniform
DIF, two items showed significant nonuniform DIF at the .05 level, and one of these items (item 15)
did not show significant uniform DIF. For item 15 nonuniform DIF was indicated, but the effects
favoring each group balanced out across score levels, so this item exhibited no uniform DIF.
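The three reporting levels can be applied mechanically to each p-value. The sketch below is a hypothetical helper (not from the original analysis) that returns the star notation used in the table legends, with the strictest level set by the Bonferroni adjustment 0.05 divided by the number of items. The example p-values are taken from Table 3.

```python
def significance_flag(p_value, n_items):
    """Return '***', '**', '*', or '' following the table legends.

    Assumes 0.05 / n_items is smaller than 0.01, as it is for 27 items.
    """
    if p_value <= 0.05 / n_items:
        return "***"
    if p_value <= 0.01:
        return "**"
    if p_value <= 0.05:
        return "*"
    return ""


# Uniform DIF p-values from Table 3 for items 7, 16, and 20,
# and both p-values for item 15.
flags = {
    "item 7 uniform": significance_flag(0.00078, 27),
    "item 16 uniform": significance_flag(0.00452, 27),
    "item 20 uniform": significance_flag(0.01044, 27),
    "item 15 uniform": significance_flag(0.25820, 27),
    "item 15 nonuniform": significance_flag(0.01235, 27),
}
print(flags)
```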

The logistic discriminant function analysis (LDFA) results using the models in Equations 16, 
18 and 20 are presented in Table 4. The results in Table 4 differ slightly from the results in Table 
4 of Miller and Spray (1993) because the analysis reported here used a matching variable that did 
not include the studied item, whereas Miller and Spray (1993) used a matching variable that did 
include the studied item. 

There is little difference between the LDFA and polynomial loglinear models in terms of
the tests for uniform DIF. For the polynomial loglinear model the test for nonuniform DIF was
significant at the .05 level of significance for only items 15 and 26. For the LDFA model the test
for nonuniform DIF was significant for ten items at the .05 level of significance, for five items at
the .01 level of significance, and for two items at the .00185 level of significance. For these data the
LDFA model indicated more nonuniform DIF than the polynomial loglinear model.

The log of the odds ratios in Equation 3 for the observed counts and fitted counts for the
polynomial loglinear and LDFA nonuniform DIF models were graphed and compared to explore
the DIF trends across score levels. Graphs were created for dichotomous items in which significant
nonuniform DIF was indicated for the LDFA model but not for the polynomial loglinear model.

The graph of the log-odds as a function of test score (excluding the studied item) is presented 
in Figure 1 for item 16. In all figures group 0 is males and group 1 is females. 

As can be seen from the graph in Figure 1, the observed log-odds lie primarily above the line 
of no DIF. The line of no DIF is a horizontal line perpendicular to the y-axis (or log-odds ratio axis) 
at 0.0. If an item had no DIF, the observed data would approximate this line. The observed data 
shows scatter, but it does not appear to have any trend. As can also be seen from the graph, the 
fitted log-odds ratios for the LDFA model have more slope than the fitted log-odds ratios for the 
polynomial loglinear model. The polynomial loglinear model has very little slope, and significant 
nonuniform DIF was not indicated for this model. For this item the polynomial loglinear model
appears to fit the observed data more closely than the LDFA model. 
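For a dichotomous item, the observed log-odds ratio at a given score level can be computed from the 2 x 2 table of group by item response at that level. Equation 3 is not reproduced in this section, so the Python sketch below assumes the standard form, the log of the cross-product ratio. The orientation (group 0 in the numerator) and the counts are hypothetical, chosen only for illustration.

```python
import math


def log_odds_ratio(right_g0, wrong_g0, right_g1, wrong_g1):
    """Log of the cross-product ratio for one score level.

    Positive values mean the odds of a correct response are higher for
    group 0 than for group 1 (an assumed orientation). All four counts
    must be positive.
    """
    return math.log((right_g0 * wrong_g1) / (wrong_g0 * right_g1))


# Hypothetical counts by matching score:
# (right group 0, wrong group 0, right group 1, wrong group 1).
counts_by_score = {
    10: (30, 10, 20, 20),
    11: (25, 25, 25, 25),
}
log_odds = {score: log_odds_ratio(*c) for score, c in counts_by_score.items()}
print(log_odds)
```

A value of 0 at every score level corresponds to the horizontal line of no DIF in the figures.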

Log-odds plots for items 4, 5, 6, 8, and 20 are presented in Figures 2 through 6, respectively.
As with item 16, significant nonuniform DIF was indicated by the LDFA model for these items, but
not by the polynomial loglinear model. For all items except item 8 there does not appear to be
much of a trend in the log-odds ratios, which is more consistent with the fitted log-odds ratios for the
polynomial loglinear model. For item 8 there may be some slight trend in the observed log-odds
ratios, which is more consistent with the fitted log-odds ratios for the LDFA model.

There was only one item, item 15, for which the test for nonuniform DIF was significant for
the polynomial loglinear model but not for the LDFA model. For one other item, item 7, the test
for nonuniform DIF was very near to being significant for the polynomial loglinear model but not
for the LDFA model. Graphs of the log-odds ratios for items 15 and 7 are given in Figures 7 and 8.

The results in Figures 7 and 8 are similar. The fitted log-odds ratios for the polynomial loglinear 
model show a positive slope. There is only a very slight trend in the fitted log-odds ratios for the 
LDFA model, and this trend is in the opposite direction of the trend in the fitted log-odds ratios for 
the polynomial loglinear model. To the extent there is any trend in the observed log-odds ratios, that 
trend appears to be in the same direction as the trend of the fitted log-odds ratios for the polynomial 
loglinear model. 

The test for nonuniform DIF was significant for the last four polytomous items (items 24 through 27)
using the LDFA model, whereas only for item 26 was the test for nonuniform DIF significant for the
polynomial loglinear model. For the polytomous items there would be multiple log-odds ratio plots
(one plot for each pair of adjacent item response categories) for each item analogous to the single
plots for the dichotomous items given in Figures 1 through 8. Because of the sparseness of the data it
is not practical to plot the multiple observed log-odds as a function of matching variable score level
for the polytomous items.

Another way to graphically display the results for the polytomous items (that can also be used
for the dichotomous items) is to plot the means of the conditional distributions of item response
given matching variable score. Figure 9 gives a plot of the observed conditional item score means
and fitted conditional item score means using the polynomial loglinear model of uniform DIF for
item 23. The scores on item 23 range from 0 to 3. The lines in Figure 9 give the mean item score
as a function of matching variable score. The conditional means are presented separately for males
and females. If there were no DIF the conditional means for males and females would be identical.
Figure 10 gives the observed and fitted conditional means using the polynomial loglinear model
of nonuniform DIF for item 23. The observed conditional means for females are generally below
those for males in the middle of the matching score range. The model for uniform DIF (Figure 9)
appears to fit the data well. This is consistent with the results in Table 3 which indicated significant
uniform DIF, but not significant nonuniform DIF for item 23.
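The observed conditional means plotted in such figures are simply the average item score within each combination of group and matching variable score. A minimal Python sketch follows; the record layout `(group, matching_score, item_score)` and the data values are hypothetical, not taken from the Miller and Spray dataset.

```python
from collections import defaultdict


def conditional_means(records):
    """Mean item score for each (group, matching_score) cell.

    records: iterable of (group, matching_score, item_score) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # cell -> [sum of item scores, count]
    for group, matching_score, item_score in records:
        cell = totals[(group, matching_score)]
        cell[0] += item_score
        cell[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical examinees: group 0 = males, group 1 = females, item scored 0 to 3.
records = [(0, 5, 2), (0, 5, 3), (1, 5, 1), (1, 5, 2), (0, 6, 3)]
means = conditional_means(records)
print(means)
```

Plotting these means against matching score, one line per group, gives the observed curves; the fitted curves would use model-based expected counts in place of the raw records.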

Observed and fitted conditional means using the polynomial loglinear model for uniform DIF
are presented in Figure 11 for item 26. The plot of observed and fitted conditional means using
the polynomial loglinear model for nonuniform DIF is presented in Figure 12. The model for
nonuniform DIF appears to provide a better fit to the data than the model of uniform DIF. This is
consistent with the results in Table 3 which indicated significant nonuniform DIF. For matching
variable scores above around 19 the means for males are higher than the means for females, whereas
for matching variable scores below 19 the opposite is the case. The difference in means between
males and females is larger for scores above 19.

Example 2 

The second example consists of applying the polynomial loglinear model to investigate DIF
for common items in a common-item nonequivalent groups equating design. In the common-item
nonequivalent groups equating design the forms of a test to be equated are administered to different
groups along with a common set of items. The common items may be included in the score
reported to the examinee (an internal set of common items) or not included in the score reported to
the examinee (an external set of common items).

For the common item equating to provide valid results it is important that the common items 
function the same in both test forms. A common item could function differently on two forms 
due to the different contexts in which it was embedded, or the different times it was administered 
(the topic of the item might be more salient at one time versus another). One definition of the 
items functioning the same for both forms is that there is no association between item response and 
form on which the item was administered when conditioned on the score for all common items. 
DIF analysis can be used to assess whether this association exists or not. Instead of the focal and
reference groups being majority and minority or male and female (as is typical in DIF studies), the
groups here are the two forms in which the common item set is embedded and the two test dates on
which these two forms were administered.

The data used were from a 150-item professional certification test. The focus was on the 1993
form (administered in 1993). The 1993 form was linked to the 1992 form (administered in 1992,
with 37 internal common items shared with the 1993 form) and to the 1991 form (administered in
1991, with 38 internal common items shared with the 1993 form). There were 1521 examinees who
took the 1991 form, 1450 examinees who took the 1992 form, and 1375 examinees who took the 1993 form.

For the 1993/1991 data, $d_1$ was set equal to four for each studied item after roughly examining
how many parameters would be needed to model each item. For each item the likelihood ratio
chi-square for testing the nonuniform model versus the saturated model (goodness of fit test) was
not significant at the .05 level of significance.

Three different significance levels were used for the analysis: 0.05, 0.01, and 0.05 / 38 =
0.0013158 (a Bonferroni adjustment). For uniform DIF, a total of thirteen items were found to be
significant at the .05 level and beyond, ten were significant at the .01 level and beyond, and five
were significant at the 0.0013 level of significance. The results for all items are presented in Table
5.
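The counts of items reaching each level can be checked directly against the uniform DIF p-values listed in Table 5. The Python sketch below transcribes those 38 p-values (items 1 through 38, in order) and counts how many fall at or below each of the three thresholds; the variable names are illustrative.

```python
# Uniform DIF p-values for the 1993/1991 common items, transcribed from Table 5.
uniform_p = [
    0.55333, 0.12610, 0.05360, 0.31609, 0.00566, 0.00007, 0.01460, 0.34798,
    0.42140, 0.08228, 0.70345, 0.00220, 0.80771, 0.00007, 0.02067, 0.06010,
    0.20116, 0.02443, 0.43117, 0.67469, 0.63999, 0.00262, 0.98877, 0.10616,
    0.00550, 0.00561, 0.16918, 0.26070, 0.00000, 0.00000, 0.07122, 0.58980,
    0.00007, 0.09497, 0.37600, 0.96228, 0.20402, 0.22175,
]

bonferroni_level = 0.05 / len(uniform_p)  # 0.05 / 38

count_05 = sum(p <= 0.05 for p in uniform_p)
count_01 = sum(p <= 0.01 for p in uniform_p)
count_bonferroni = sum(p <= bonferroni_level for p in uniform_p)

print(count_05, count_01, count_bonferroni, round(bonferroni_level, 7))
```

The counts agree with the thirteen, ten, and five items reported in the text.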

For non-uniform DIF, only two items were significant at the 0.01 level. Both items that 
exhibited non-uniform DIF also exhibited uniform DIF at the 0.0013 level of significance. 

All items that showed significant DIF at the 0.01 level of significance were examined by
looking at the actual content of the items (the item stems) and their responses (alternatives). Two
of the items for which uniform DIF was indicated at the 0.0013 level of significance had syntactic
differences between the two forms. For one of the items, "... NOT..." (all capital letters) was used
in the stem while for that item on the other form "...not..." (underlined lower case) was used. For
the other item, the word "vs." in the stem was written with a period at the end of the abbreviation
on one form, and on the other form it was just written as "vs" without a period at the end. No
other noticeable syntactic differences were found for the other items which had significance levels
less than 0.01. There should be no syntactic differences in an item between forms (every common item
should be absolutely identical between forms). These differences were missed by test development
staff who checked that the items were identical on the two forms.

Items that are functioning differently on the two test forms may have an adverse effect on the
equating. To study the effect on equating of including the two items with syntactic differences,
the equating analysis was re-done excluding those two common items. Before proceeding, it was
necessary to determine whether these two items needed to stay in the common-item
pool for the sake of content specifications. The common-item pool should be a mini-version of the
test, and its range of content must match as closely as possible that of the entire set of items
used to compute the score reported to examinees. In this particular test all items fell into one of
four content areas. Two of the content areas had large numbers of items; the other two content areas
had small numbers of items. The items which exhibited the large amount of uniform DIF, and had
syntactic problems, came from content areas in which they could be removed, and no harm would
be done to the balance of content in the common-item set.

The equating was re-done excluding these two items as common items. In the recomputed
equating the two items are considered non-common items. Tucker and Levine Observed Score
equating functions were computed (Kolen and Brennan, 1987). As an indication of the difference
in the equatings, the number of examinees whose scale scores would change if the two items were
not used as common items was calculated (the scale scores range from 0 to 150). For the Tucker
equating 23 out of the 1375 examinees who took the new form would have a score change of 1
point (either increase or decrease), and for the Levine equating scores for 310 examinees would
have changed by one point. Given that the maximum score change is one point on a 151-point scale
and the number of examinees with a one point change is not large, it is concluded that not including
these two items as common items does not have an important effect on the equating results.

For the 1993/1992 data, $d_1$ was also set equal to four for each studied item. For each item the
likelihood ratio chi-square for testing the nonuniform model versus the saturated model (goodness
of fit test) was not significant at the .05 level of significance.

Again, three different significance levels were used for the analysis: 0.05, 0.01, and 0.05
/ 37 = 0.0013514 (a Bonferroni adjustment). For uniform DIF, a total of eight items were found
to be significant at the 0.05 level and beyond, four were significant at the 0.01 level and beyond,
and none were significant at the 0.0014 level of significance. For nonuniform DIF, only two items
(items 15 and 28) were significant at the 0.05 level. Neither of these items exhibited significant
uniform DIF. The results for the 1993/1992 equating are presented in Table 6.

As in the 1993/1991 equating study, all items that showed significant DIF at the .01 level of 
significance (and beyond) were examined by looking at the actual content of the items (the item 
stems) and their responses (alternatives). None of the items manifested any apparent reasons why 
they should perform differently in the two different forms. 

The results indicated more DIF for the 1993/1991 equating items than for the 1993/1992 
equating items. A plausible ad-hoc explanation is that since some of the items had somewhat political 
and time related content, there would be less bias when the time interval between administrations 
was smaller as any effect of time related content would be reduced. 

Discussion 

The focus of the investigation of DIF is the conditional association between item response and 
group given a matching variable. This association can be modeled by loglinear models, or logit 
models using either the item response or group as the dependent variable. 

Loglinear and logit models for studying DIF were presented and loglinear formulations of the
logit models were given. A polynomial loglinear model using scores on the matching variable and
item responses was introduced. This model contains far fewer parameters than loglinear models that
treat the matching variable and item response as nominal. Unlike the logit models, the polynomial
loglinear model is generalized to the case of more than two item responses and more than two
groups. An advantage of the polynomial loglinear model is that it provides a non-saturated model
of nonuniform DIF that is able to detect more complex forms of DIF than logit models that have
been suggested (Equation 22 versus Equations 11 and 17), although it is possible that the logit
models could be expanded to model more complex forms of DIF.

An example of using the polynomial loglinear model to study DIF was given using data from
Miller and Spray (1993). The results of the polynomial loglinear model were compared to the LDFA
method given by Miller and Spray (1993). The methods were fairly consistent in their identification
of uniform DIF. The LDFA model indicated more nonuniform DIF in the items than the polynomial
loglinear model. Examination of graphs of the conditional log-odds ratios for items where the LDFA
and loglinear models gave different indications of nonuniform DIF indicated that the polynomial
loglinear model appeared to fit better than the LDFA model for most items, although the LDFA
model did appear to fit better for one of the items.

The results presented for the polynomial loglinear and LDFA models cannot be used to conclude 
which model is best for the data used, or even if either model is providing accurate results, since 
the amount of DIF in the items is unknown. The purpose here was to provide an example of the 
application of the polynomial loglinear model and a comparison of the results to those obtained 
from the LDFA model. Simulation could be used to study the absolute and relative performance
of the methods.

A second example involved using the polynomial loglinear model to study DIF in common 
equating items. In this application of DIF techniques items are studied for differential functioning 
across different forms in which they are embedded and different test dates on which those forms 
are given. In common item equating it is important that the common items function the same in 
the forms being equated, and DIF techniques offer a useful set of tools for studying this question. 
In the example presented, two items for which the test for uniform DIF was significant were found 
to have syntactic differences between the forms that were undetected by visual examination of the 
forms by test development staff. 

It would be useful to develop confidence bands as in Miller and Spray (1993) for use in
graphical displays such as those displayed in the figures. The usefulness of confidence bands is
demonstrated in Miller and Spray (1993) where they are used to identify regions of the matching
variable for which DIF is present. Confidence bands and significance tests both have the property
that smaller amounts of DIF can be detected as significant with larger sample sizes. This can be a
problem when the DIF detected as statistically significant is not practically important.

Table 2

Polynomial Loglinear Models Used for Miller/Spray Open-Ended Items Dataset

          Number of Parameters
Item      $d_1$    $d_2$    $d_3$
22        4        3        4
23        4        3        5
24        4        4        1
25        4        4        2
26        4        4        1
27        4        6        4

Table 3

Polynomial Loglinear Model for the Miller/Spray Dataset

                        Uniform DIF                          Non-Uniform DIF
Item              d.f.  chi-square    p <              d.f.  chi-square    p <

Multiple-Choice
 1                1     [illegible]   [illegible] ***  1     0.022         0.88316
 2                1     [illegible]   0.00000 ***      1     1.777         0.18251
 3                1     0.120         0.72931          1     0.268         0.60437
 4                1     5.917         0.01499 *        1     1.467         0.22588
 5                1     [illegible]   0.00000 ***      1     0.687         0.40735
 6                1     0.763         0.38239          1     0.676         0.41106
 7                1     11.277        0.00078 ***      1     3.839         0.05007
 8                1     [illegible]   0.00000 ***      1     1.223         0.26879
 9                1     [illegible]   0.00000 ***      1     0.207         0.64883
10                1     1.265         0.26073          1     0.262         0.60852
11                1     0.179         0.67220          1     0.016         0.89998
12                1     17.280        0.00003 ***      1     0.039         0.84399

Gridded
13                1     0.025         0.87368          1     0.001         0.97814
14                1     0.030         0.86241          1     1.212         0.27094
15                1     1.278         0.25820          1     6.260         0.01235 *
16                1     8.061         0.00452 **       1     0.155         0.69348
17                1     3.824         0.05053          1     1.359         0.24364
18                1     0.018         0.89427          1     2.423         0.11958
19                1     3.256         0.07114          1     1.008         0.31529
20                1     6.558         0.01044 *        1     0.478         0.48936
21                1     0.145         0.70332          1     0.625         0.42913

Open-Ended
22                3     1.772         0.62106          4     7.298         0.12097
23                3     18.229        0.00039 ***      5     6.947         0.22463
24                4     5.555         0.23491          1     2.208         0.13732
25                4     6.208         0.18412          2     0.830         0.66045
26                4     15.057        0.00458 **       1     6.098         0.01354 *
27                6     13.382        0.03735 *        4     7.318         0.12002

* <= .05, ** <= .01, *** <= (.05 / 27)

Note: all dichotomous items were fit with $d_1 = 4$, except for items 7 and 13 where $d_1 = 3$.

Table 4

Logistic Discriminant Function Analysis for Miller/Spray Dataset

                        Uniform DIF                          Non-Uniform DIF
Item              d.f.  chi-square    p <              d.f.  chi-square    p <

Multiple-Choice
 1                1     9.598         0.00195 **       1     1.389         0.23857
 2                1     24.595        0.00000 ***      1     1.155         0.28240
 3                1     0.059         0.80843          1     2.562         0.10948
 4                1     5.322         0.02106 *        1     8.151         0.00430 **
 5                1     31.648        0.00000 ***      1     3.924         0.04760 *
 6                1     0.825         0.36387          1     4.437         0.03517 *
 7                1     13.314        0.00026 ***      1     0.216         0.64189
 8                1     43.711        0.00000 ***      1     8.276         0.00402 **
 9                1     27.397        0.00000 ***      1     1.289         0.25616
10                1     1.122         0.28955          1     1.246         0.26428
11                1     0.143         0.70578          1     0.112         0.73795
12                1     19.614        0.00001 ***      1     0.256         0.61315

Gridded
13                1     0.094         0.75918          1     3.141         0.07634
14                1     0.009         0.92439          1     1.193         0.27479
15                1     1.202         0.27299          1     0.008         0.92743
16                1     6.471         0.01097 *        1     4.694         0.03028 *
17                1     3.809         0.05098          1     0.332         0.56465
18                1     0.012         0.91453          1     0.032         0.85901
19                1     5.216         0.02238 *        1     0.003         0.95462
20                1     3.489         0.06177          1     4.626         0.03150 *
21                1     0.076         0.78290          1     0.126         0.72258

Open-Ended
22                1     1.683         0.19457          1     0.174         0.67647
23                1     17.086        0.00004 ***      1     3.662         0.05566
24                1     3.449         0.06329          1     8.189         0.00421 **
25                1     1.468         0.22572          1     5.956         0.01467 *
26                1     6.813         0.00905 **       1     14.169        0.00017 ***
27                1     5.214         0.02241 *        1     12.318        0.00045 ***

* <= .05, ** <= .01, *** <= (.05 / 27)

Table 5

Polynomial Loglinear Model for the Equating 1993/1991 Dataset

                        Uniform DIF                          Non-Uniform DIF
Item              d.f.  chi-square    p <              d.f.  chi-square    p <
 1                1     0.351         0.55333          1     0.902         0.34212
 2                1     2.340         0.12610          1     0.003         0.95570
 3                1     3.725         0.05360          1     1.267         0.26031
 4                1     1.005         0.31609          1     0.200         0.65494
 5                1     7.656         0.00566 **       1     0.046         0.83047
 6                1     15.761        0.00007 ***      1     0.062         0.80312
 7                1     5.965         0.01460 *        1     0.000         0.99691
 8                1     0.881         0.34798          1     2.239         0.13456
 9                1     0.646         0.42140          1     0.927         0.33572
10                1     3.019         0.08228          1     1.454         0.22787
11                1     0.145         0.70345          1     0.231         0.63048
12                1     9.378         0.00220 **       1     2.930         0.08697
13                1     0.059         0.80771          1     1.005         0.31601
14                1     15.918        0.00007 ***      1     0.053         0.81859
15                1     5.355         0.02067 *        1     3.426         0.06417
16                1     3.534         0.06010          1     1.342         0.24663
17                1     1.634         0.20116          1     1.003         0.31648
18                1     5.064         0.02443 *        1     1.126         0.28857
19                1     0.620         0.43117          1     0.031         0.86057
20                1     0.176         0.67469          1     0.208         0.64831
21                1     0.219         0.63999          1     0.114         0.73533
22                1     9.056         0.00262 **       1     1.060         0.30321
23                1     0.000         0.98877          1     0.380         0.53749
24                1     2.610         0.10616          1     2.513         0.11291
25                1     7.707         0.00550 **       1     1.465         0.22613
26                1     7.672         0.00561 **       1     0.292         0.58902
27                1     1.890         0.16918          1     1.196         0.27403
28                1     1.265         0.26070          1     0.007         0.93382
29                1     95.091        0.00000 ***      1     8.596         0.00337 **
30                1     24.799        0.00000 ***      1     7.184         0.00735 **
31                1     3.255         0.07122          1     0.972         0.32408
32                1     0.291         0.58980          1     0.219         0.63959
33                1     15.929        0.00007 ***      1     0.549         0.45860
34                1     2.788         0.09497          1     1.178         0.27771
35                1     0.784         0.37600          1     1.695         0.19298
36                1     0.002         0.96228          1     0.013         0.91011
37                1     1.613         0.20402          1     2.020         0.15525
38                1     1.493         0.22175          1     0.200         0.65452

* <= .05, ** <= .01, *** <= (.05 / 38)

Table 6

Polynomial Loglinear Model for the Equating 1993/1992 Dataset

                        Uniform DIF                          Non-Uniform DIF
Item              d.f.  chi-square    p <              d.f.  chi-square    p <
 1                1     5.209         0.02248 *        1     0.664         0.41532
 2                1     7.616         0.00578 **       1     2.941         0.08634
 3                1     1.198         0.27376          1     0.012         0.91144
 4                1     1.801         0.17962          1     2.738         0.09798
 5                1     0.280         0.59663          1     0.172         0.67793
 6                1     0.237         0.62672          1     0.007         0.93153
 7                1     0.349         0.55443          1     0.908         0.34053
 8                1     0.828         0.36274          1     0.108         0.74214
 9                1     9.076         0.00259 **       1     3.238         0.07196
10                1     0.410         0.52216          1     0.565         0.45223
11                1     2.688         0.10114          1     0.000         0.99542
12                1     0.181         0.67060          1     0.688         0.40675
13                1     0.097         0.75553          1     0.578         0.44710
14                1     3.161         0.07543          1     1.346         0.24600
15                1     0.452         0.50118          1     3.973         0.04623 *
16                1     0.341         0.55918          1     0.635         0.42564
17                1     6.886         0.00869 **       1     1.602         0.20563
18                1     3.904         0.04816 *        1     3.015         0.08251
19                1     1.476         0.22433          1     0.111         0.73919
20                1     0.605         0.43657          1     0.807         0.36892
21                1     1.680         0.19494          1     0.110         0.74022
22                1     0.738         0.39025          1     1.284         0.25708
23                1     0.971         0.32445          1     1.460         0.22689
24                1     0.231         0.63087          1     1.466         0.22594
25                1     0.954         0.32868          1     0.269         0.60374
26                1     5.595         0.01801 *        1     0.463         0.49606
27                1     0.450         0.50239          1     3.515         0.06081
28                1     2.687         0.10119          1     4.696         0.03023 *
29                1     6.081         0.01367 *        1     0.028         0.86797
30                1     6.794         0.00915 **       1     0.250         0.61731
31                1     0.229         0.63260          1     1.012         0.31444
32                1     0.017         0.89563          1     0.358         0.54954
33                1     1.099         0.29447          1     0.013         0.90782
34                1     1.753         0.18544          1     0.003         0.95455
35                1     3.164         0.07528          1     0.148         0.70059
36                1     3.621         0.05705          1     0.211         0.64597
37                1     0.224         0.63598          1     0.767         0.38105

* <= .05, ** <= .01, *** <= (.05 / 37)